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Abstract 

We  describe  how  to  take  a  stable,  ARM  A,  time  series  through  the 
various  stages  of  model  identification,  parameter  estimation,  and  diag¬ 
nostic  checking,  and  accompany  the  discussion  with  a  goodly  number  of 
large  scale  simulations  that  show  which  methods  do  and  do  not  work,  and 
where  some  of  the  pitfalls  and  problems  associated  with  stable  time  series 
modelling  lie. 
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1,  Introduction 


There  are  three  major  stages  in  the  now  standard  “Box- Jenkins”  time  series 
modelling  techniques  for  Gaussian  time  series:  Model  identification,  parameter 
estimation,  and  diagnostic  checking. 

In  many  ways,  the  techniques  behind  these  three  stages  really  only  involve 
two  bags  of  tricks,  since  most  diagnostic  checks  rely  on  testing  whether  or  not  the 
fitted  residuals,  after  parameter  estimation,  behave  like  a  white  noise  sequence. 
This,  of  course,  is  tantamount  to  identifying  a  model  for  the  residuals,  and  so 
takes  us,  more  or  less,  back  to  stage  one. 

In  this  paper  we  will  concentrate  on  a  variety  of  issues  related  to  ARMA 
model  identification  in  the  stable  setting.  The  paper  by  Calder  and  Davis,  in 
this  volume,  [CD],  describes  the  parameter  estimation  problem,  so  that  the  two 
papers,  together,  should  give  a  good  overview  of  the  overall  ARMA  problem 
and  be  of  some  assistance  to  a  practitioner  who  wishes  to  analyse  a  particular 
series.  We  shall  also  have  something  to  say  about  parameter  estimation,  for  one 
specific  technique. 

There  are  no  new  theorems  in  this  paper,  or  even  really  new  ways  of  thinking 
about  things.  Rather,  we  have  tried  to  collect,  in  one  place,  a  number  of  results 
that  are  rather  widely  scattered,  and  to  investigate  their  practical  efficiency  on 
replicates  of  synthetic  data.  Some  of  the  results  are  somewhat  surprising,  and 
some  more  than  a  little  worrying.  Many  beg  further,  and  deeper,  theoretical 
investigation. 

^Research  supported  in  part  by  Office  of  Naval  Research,  grants  N00014-94-1-0191,  N00014- 
93-1-0800  and  N00014-96-1-0739 
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The  bottom  line  will  be  that  while,  in  principle,  the  standard  Gaussian  Box- 
Jenkins  techniques  [BJ],  [BD]  do  carry  over  to  the  stable  setting,  in  practice  a 
great  deal  of  care  needs  to  be  exercised. 

Results  in  a  similar  vein  can  also  be  found  in  the  paper  [R2]  in  this  volume, 
as  well  as  [Rl]  and  [FR].  These  papers  treat  real  as  well  as  synthetic  data,  and 
general  heavy  tailed  rather  than  purely  stable  series. 

Finally,  before  we  start,  we  should  determine  precisely  what  we  mean  by 
the  various  stable  parameters  by  defining  a  stable  distribution  with  parameters 
(a,  cr,  fi)  as  usual  via  its  characteristic  functions,  as  follows: 

_  j  exp  (1  -  t/?(signi)tan  +  ifit}  if  a  ^  I, 

^  “  I  exp|-a|i|  (^1+ ^(signi)ln|fl^ if  a  =  1, 

where  0  <  a  <  2,  o’  >  0,  -1  <  <  1,  and  /i  €  5R.  We  shall  denote  such  a 

distribution  by  writing  Z  ^  Sa(cr,P,n)- 

2.  Preliminary  data  analysis  -  Is  it  stable? 


The  first  question  that  must  be  broached  is  whether  or  not  our  data  is 
“heavy  tailed”,  in  some  general  sense,  and,  if  so,  whether  or  not  it  is  stable.  We 
shall  not  be  interested  in  the  possibility  of  heavy  tailed,  but  non-stable  data, 
for  a  number  of  reasons; 

1.  If  the  data  is  in  the  domain  of  attraction  (cf.  [FE])  of  a  stable  distribution, 
then,  in  general,  large  sample  techniques  are  identical  to  those  for  the 
purely  stable  situation. 

2.  In  the  domain  of  attraction  case,  the  difference  between  a  stable  and  non¬ 
stable  model  hes  in  the  central  region  of  the  distribution.  If  one  is  using 
stable  or  other  heavy  tailed  techniques,  this  is  generally  not  the  region  of 
interest. 

3.  In  the  case  of  heavy  tails,  not  in  a  stable  domain  of  attraction,  there  are 
comparatively  few  reliable  techniques  around  (see  [Rl,  FR]  for  further 
details  and  discussion). 

We  shall  also  make  one  significant  simplification  throughout  this  paper;  We 
shall  virtually  always  work  with  examples  in  which,  in  terms  of  (1.1),  ^  ^  =  0; 

i.e.  with  centered  and  symmetric  variables.  This  simplification  is  common  in 
most  of  the  theory  that  we  quote,  although,  unfortunately,  it  is  not  always 
justified  in  practice.  However,  we  doubt  that  it  has  much  Quolitotivc  effect 
on  the  phenomena  we  shall  look  at.  That  this  is  definitely  the  case  in  some 
situations  is  born  out  by  [KN2]. 
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2.1  Graphing  the  series. 

The  simplest,  most  obvious,  and  often  most  powerful,  techniques  for  detecting 
stable  data  are  also,  unfortunately,  ones  with  very  little  theory  behind  them. 
They  start,  with  a  visual  inspection  of  the  data,  in  a  search  for  highly  “inho¬ 
mogeneous”  data,  in  the  (non-technical)  sense  that  one,  or  a  few  observations, 
dominate  the  rest.  This  is  generally  so  notable,  that  on  graphing  the  time  series, 
most  of  the  data  is  “squeezed”  onto  the  horizontal  axis  by  the  automatic  scaling 
of  the  plotting  routine. 

An  example  of  this  is  given  if  Figure  1,  where  plots  of  four  stable  time  series 
are  given.  Each  follows  the  AR(1)  model 

X,  -  (2.1) 

where  the  {Zt}  are  symmetric  i.i.d.  stables  with  scaling  parameter  a  =  1.  It  is 
obvious  that  all  three  of  the  stable  cases  are  qualitatively  different  to  the  final, 
Gaussian,  case. 


0  50  100  150  200 

(a) 

0  so  100 

ISO  200 

Figure  1:  Four  AR(1),  stable,  time  series,  with  increasing  values  of  a. 
(a)  a  =  0.6,  (b)  a  1,2,  (c)  a  =  1.8,  (d)  a  =  2.0 


Of  course,  in  each  of  these  cases,  it  would  be  hard  to  distinguish  on  a 
graphical  basis  between  a  purely  stable  series  and  a  Gaussian  series  with  the 
occasional  outlier  (cf.  the  examples  in  [MI]).  This  requires  looking  more  carefully 
at  further  distributional  information. 


2.2  The  histogram. 

Essentially  the  same  information  as  is  obtained  by  graphing  the  series  can  be 
gleaned  from  the  histogram  of  the  data.  What  is  lost  in  the  histogram  is,  of 
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course,  the  temporal  structure  of  the  data,  but  what  is  more  obvious  is  the 
presence  or  absence  of  symmetry. 

Figure  2  shows  a  histogram  from  data  generated  by  the  same  model  as  in 
(2.1),  but  now  only  for  two  cases,  a  =  2  (Gaussian)  and  a  =  1.6,  and  for  series 
of  length  1,000.  Two  factors  should  be  noted  here.  The  first  is  that,  despite  the 
fact  that  the  sample  size  is  quite  large,  the  Gaussian  case  is  much  further  from 
the  traditional  bell  curve  than  one  would  expect  with  i.i.d.  data.  But,  this  is 
correlated  data,  so  that  the  laws  of  large  numbers  take  longer  to  come  into  play. 

The  second  is  a  repetition  of  the  phenomenon  mentioned  above,  about  au¬ 
tomatic  scaling  “spoiling”  the  graph.  In  (b)  a  few  outliers  are  so  large  that  the 
entire  histogram  is  squeezed  into  a  few  bars  in  the  center.  When  the  largest 
and  the  smallest  5%  of  the  data  is  truncated,  as  in  (c),  the  shape  of  the  graph 
changes  dramaticallj'. 


Figure  2:  Histograms  of  (a)  Gaussian,  (b)  Stable,  a  =  1.6,  and  (c)  Truncated 
stable  time  series. 


2.3  The  “converging  variance”  test. 

One  of  the  oldest  tests  for  determining  whether  data  has  infinite  variance  is  the 
trick  of  plotting  the  sample  variance  5^,  based  on  the  first  n  observations,  as  a 
function  of  n.  If  the  data  comes  from  population  with  finite  variance,  5^  should 
converge  to  a  finite  value.  Otherwise,  it  should  diverge  as  n  grows,  and  the 
graph  typically  shows  large  jumps. 

Although  this  test  was  originally  designed  for  i.i.d.  data,  it  also  works  well 
for  correlated  data,  as  long  as  the  order  of  the  observations  is  first  randomised, 
so  as  to  destroy  dependencies  that  might  lead  to  trends  and  jumps  with  other 
explanations. 

Figure  3  contains  graphs  for  this  test  for  two  stable  {a  =  0.8, 1.6),  Gaus¬ 
sian,  and  “t”  (with  4  degrees  of  freedom)  processes.  (By  the  last  we  mean  an 
ARM  A  process  in  which  the  innovations  Zt  have  a  Student  t  distribution.  This 
is  an  interesting  case,  since  it  gives  a  distribution  with  much  heavier  than  Gaus¬ 
sian  tails,  but  in  the  Gaussian  domain  of  attraction.)  For  variety,  we  took  the 
ARMA(1,1)  model  Xt  -  0.3Xt_i  =  Zt  +  .5Zt-i,  with  series  of  length  1,000. 

The  divergence  of  S^,  as  n  grows,  and  the  irregularity  of  the  graphs  in  the 
two  stable  cases  are  very  marked. 
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Figure  3:  Cumulative  variance  plots  for  (a)  Gaussian,  (b)  a  =  0.8,  (c)  a  =  1.6 
and  (d)  Student  t  ARM  A  (1,1)  processes. 


2.4  Preliminary  estimation  of  the  stable  parameters. 

One  last,  and  natural,  step  before  entering  the  time  series  arena  proper,  is  the 
estimation  of  the  various  stable  parameters. 

There  are  a  number  of  techniques  available  for  estimating  tail  decay,  most 
of  which  are  built  around  the  so-called  “Hill  estimator”,  and  many  of  which, 
while  based  on  sound  theory,  turn  out  to  be  far  from  satisfactory  in  practice, 
(cf.  [Rl,  R2]  or  [RDM],  in  this  volume,  for  details.) 

This,  indeed,  is  one  of  the  reasons  for  being  prepared  to  assume  a  specifically 
stable  model,  rather  than  one  generically  heavy  tailed.  For  in  the  stable  case 
there  exists  an  excellent  estimator  of  the  parameters,  due  to  Hu  McCulloch 
[MC].  This  estimator,  which  is  based  in  essence  on  fitting  tabulated  quantiles 
of  stable  distributions,  works  for  a  €  [0.6, 2.0]  and  P  G  [—1,1]  (which  covers 
most  of  the  cases  met  in  practice)  and  all  values  of  the  other  parameters.  The 
estimator  of  the  location  parameter  however,  can  be  inaccurate  near  a  =  1. 

The  McCulloch  estimator  was  originally  designed  for,  and  indeed  works 
best  on,  i.i.d.  data.  Nevertheless,  some  initial  information  on  the  parameters, 
especially  a,  is  generally  required  for  model  identification,  so  that  one  has  no 
choice  but  to  work  with  the  time  series  data. 

A  typical  example  of  the  precision  of  the  McCulloch  estimator  of  a  is  given 
in  Table  1,  in  which  the  results  for  estimating  a  from  1,000  observations  from 
the  MA(2)  model  Xt  =  Zt-{-  -  .3Zt.>2  are  presented  for  various  a.  The 

accuracy  of  the  estimator  is,  we  believe,  truly  impressive. 

For  each  iteration  we  estimate  a  from  the  simulated  innovations,  and  then 
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from  the  time  series.  For  the  final  column,  we  estimated  the  MA  parameters 
using  Whittle’s  estimator,  described  in  Section  4,  and  computed  the  residuals. 
We  then  estimated  a  a  third  time  using  these  residuals,  expecting  (incorrectly) 
that  this  estimation  would  be  better  than  from  the  raw  time  series.  We  repeated 
this  process  10,000  times.  Clearly  the  estimates  obtained  from  the  innovations 
are  the  best,  but,  perhaps  rather  surprisingly,  McCulloch’s  technique  seems  to 
work  better  when  applied  to  the  original  time  series  rather  than  to  the  residuals. 


Estimation  of  alpha,  MA(2) 

a 

Innovations 

Time  series 

Residuals 

.6 

.8 

1 

1.2 

1.4 

1.6 

1.8 

2 

0.633  (0.0397) 
0.804  (0.0363) 
1.002  (0.0439) 
1.203  (0.0506) 
1.402  (0.0582) 
1.606  (0.0711) 
1.808  (0.0935) 
1.953  (0.0653) 

0.665  (0.0796) 
0.811  (0.0524) 
1.007  (0.0593) 
1.206  (0.0648) 
1.405  (0.0690) 
1.608  (0.0791) 
1.810  (0.0977) 
1.954  (0.0659) 

0.679  (0.1010) 
0.822  (0.0617) 
1.014  (0.0666) 
1.209  (0.0724) 
1.407  (0.0769) 
1.610  (0.0888) 
1.812  (0.1023) 
1.952  (0.0683) 

Table  1:  Mean  and  standard  deviation  (in  parentheses)  of  10,000  estimates  of 
alpha  using  simulated  innovation  sequence,  corresponding  time  series  and 

estimated  residuals. 


Interestingly,  however,  McCulloch’s  estimator  does  not  seem  to  work  as  well 
for  AR  processes  as  it  does  for  pure  moving  averages,  at  least  for  small  values 
of  a,  the  main  problem  being  in  the  substantially  increased  sample  variance, 
rather  than  in  the  bias. 

There  are  a  number  of  possible  explanations  for  this,  although  we  are  not 
certain  which,  if  any,  is  real.  Two  candidates  for  consideration  are: 

(i)  It  may  simply  be  due  to  divergences,  numerical  and  other,  as  a  becomes 
close  to  the  region  where  the  estimator  is  not  supposed  to  work. 

(ii)  In  the  Gaussian  case,  the  correlation  structure  is  much  stronger  in  the  AR(2) 
model  than  in  the  MA(2),  and  this  should  affect  estimator  variance.  Translating 
“correlation”  to  “dependence”  in  the  stable  case,  may  create  a  similar  problem. 

However,  when  estimating  a  from  the  residuals  obtained  from  the  Whittle 
parameter  estimates,  the  simulations  indicate  that  the  estimates  are  superior  in 
the  AR  case  (at  least  for  a>l). 

Table  2  gives  the  result  of  a  similar  study  for  the  model  Xt  —  .SXt-j  H- 
.7Xt.-2  =  Zt^ 

Note  that  these  positive  results  become  less  than  satisfactory  when  one 
leaves  the  permissible  parameter  region  for  the  estimator,  and  in  the  region  of 
a  —  1.  We  simulated  a  sample  of  length  1000  from  a  of  symmetric  Cauchy 
distribution  {a  =  I,  fx  0)  and  used  McCulloch’s  estimator  to  estimate  the 
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location  parameter  ji.  The  mean  of  50  of  such  independent  estimates  of  location 
was  -6.17. 

Similarly,  estimates  of  a  for  values  of  a  <  0.6  also  give  poor  and  misleading 
results.  For  example,  the  average  of  100  estimates  of  a  based  on  independent 
samples  of  1000  was  5.23  for  a  =  0.3,  and  0.95  for  a  0.5. 


Estimation  of  alpha,  AR(2) 

a 

Time  series 

Residuals 

.6 

0.836  (0.7122) 

0.790  (0.6434) 

.7 

0.758  (0.2976) 

0.740  (0.2822) 

.8 

0.843  (0.1040) 

0.818  (0.0702) 

1 

1.033  (0.0946) 

1.009  (0.0561) 

1.2 

1.213  (0.0975) 

1.207  (0.0544) 

1.4 

1.408  (0.1007) 

1.405  (0.0588) 

1.6 

1.612  (0.1044 

1.608  (0.0719) 

1.8 

1.813  (0.1079) 

1.811  (0.0941) 

2 

1.954  (0.0670) 

1.954  (0.0655) 

Table  2:  Mean  and  standard  deviation  (in  parentheses)  of  10,000  estimates  of 
alpha  using  simulated  time  series  and  estimated  residuals. 


3,  Model  Identification 


The  first  step  when  fitting  data  {Xi,X2,  - .  - ,  Xn}  into  a  linear  ARMA(p,  q) 
time  series  model 

Xt  —  (flXt^i  —  •  •  •  —  (f^Xt-p  Zt  A-  OiZt-l  -f - 1-  OqZt^q,  (3.1) 

with  i.i.d,  innovations  {Zt},  normal  or  stable,  is  the  identification  of  the  lag 
parameters  p  and  q. 

In  the  Gaussian  case  model  identification  techniques  are  based  on  analysis 
of  the  sample  autocorrelation  function  (ACF) 

n—h  n 

p{h)  =  ^  XtXt+h/  ^  2,  /i  =  1, 2, . . . ,  (3.2) 

t-1  t=l 

or  its  mean-corrected  version, 

n—h  n 

m  =  Y,{Xt  -  X){X,+H  -  X)l  Y,iXt  -  X)\  (3.3) 

t=l  t~l 

where  X  =  4-  ...  +  An),  and  of  the  equally  familiar  sample  partial 

autocorrelation  function  (PACF),  which,  to  make  some  details  below  easier  to 
follow,  we  define  in  detail; 
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Consider  the  AR(p)  model 

Xt  ~  (piXi^i  —  •  •  •  —  ^pXt—p  =  Zt, 

1  _  -  0p2P  i.  0,  l^l  <  1.  Let  =  [p(i  - 

be  the  p  x  p  matrix  of  ACFs  computed  under  a  finite  variance  assumption, 
p  ^  ^  p{p))\  0  (^1 , . ,  - ,  Then  the  Yule- Walker  matrix  equation 

is 

Rp<t>  =  P,  (3-5) 

and  the  Yule-WeJker  estimate  of  (p  is  then  defined  as  solution  of 

Rp^  =  P,  (3-6) 

where  Rp  =  [p{i  -  j)]i.;=i  and  p  =  (p(l), . . . ,  p(p))'. 

To  define  the  PACF  in  the  AR  case,  we  consider  vectors  <Pm  =  ■  ■  ■  ■> 

where  (p*  =  for  i  <  p  and  (p*i  =  0  when  i  >  p.  For  this  vector  we  write  Yule- 
Walker  equation  (3.5)  as  Rm<p*  =  Pm,  where  p„  =  (p{l), •  •  •  ,p(”^))-  The 
PACF  at  lag  m,  (p^m,  is  then  defined  as  the  m-th  component  of  the  vector 
(j)*  =  R^p.  Similarly,  the  sample  PACF  function  at  lag  m,  is  defined  as 
the  m-th  component  of  the  vector 

il  =  RZ'p-  P-7) 

In  the  heavy  tailed  case  the  ACF  and  PACF  do  not  exist,  but  we  can  still  use 
equations  (3.2)  -  (3.7)  to  define  their  sample  equivalents. 

In  the  general  ARMA  case,  the  PACF  at  lag  h  is  defined  as  the  sample 
correlation  between  the  residuals  of  Xt+k  nnd  Xt  after  linear  regression  (under 
a  finite  variance  assumption)  on  Xt+i , Xt+2 ,...,Xt+h-i- 

We  shall  use  the  notation  ^kk  for  the  PACF’s  when  the  centered  ACFs  p’s 
of  (3.3)  are  used  in  the  Yule- Walker  equation  (3.6)  instead  of  the  non-centered 
p’s  of  (3.2).  For  most  of  the  simulations  we  shall  prefer  to  use  the  centered 
variables,  and  so  will  need  to  assume  that  a  >  1. 

We  shall  see,  basically  via  simulation,  that  both  the  ACF  and  PACF  provide 
excellent  tools  for  studying  stable  time  series.  It  is  perhaps  rather  surprising 
that  although  second  moments  are  infinite  in  the  stable  case,  the  tools  that 
we  are  used  to  from  the  Caussian  case  are  still  available,  albeit  with  some 
modifications. 

The  following  subsection  sets  up  the  theory  underlying  the  use  of  the  ACF 
and  PACF.  We  then  provide  some  tables  for  hypothesis  testing,  followed  by 
two  sections  on  numerics.  The  final  subsection  looks  at  the  use  of  the  Akaike 
Information  Criterion  in  the  stable  setting. 

3.1  The  basic  theory. 

The  theoretical  basis  for  the  usage  of  the  sample  ACF  for  the  identification  of 
the  order  g  of  a  MA(q)  time  series  is  the  following  fundamental  result  of  [DR2] 
(we  follow  here  [BD]  Theorem  13.3.1): 
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Theorem  3.1  Let  {Zt}  he  an  iid  symmetric  sequence  of  a-stahle  random  vari¬ 
ables  and  let  {X*}  be  the  strictly  stationary  process , 


(3.8) 


J=~-00 


where 


<  oo  for  some  6  €  (0,  a)  O  [0, 1]. 

j=~oo 

Define  for  such  a  process  an  analogue  of  the  A  CF  function,  namely 

p{h)  =  Y  ft  =  1, 2, . . .  .  (3.9) 

j  3 

Then,  for  any  positive  integer  h, 

(n/  lnn)i/“(p(l)  -  p(l), . . .  ,p(ft)  -  p(ft))'  (Ya,  •  -  - ,  Yh)',  (3.10) 

where 

oo 

Yk=Y  +  j)  +  -  j)  -  2pOXft))  Sj/So.  (3.11) 

Here  5o,5i,...  are  independent  stable  variables;  Sq  is  positive  with  Sq  ~ 
5'a/2(C'^p^5  l,0)j  l^he  Sj  ^  5a(C'a^^",0,0)  where 


Ca  =  i  r(2-«)cos(2^)  “  7^  1 

I  I  ifa  =  l. 


(3.12) 


The  marginal  distribution  of  each  Yk  is  somewhat  simpler,  and  we  have 

f  CO  \ 

it  =  ( ]^  |p(ft  +  3)  +  -  j)  -  2p(jX*)l“  j  a fV,  (3.13) 

where  V  and  U  are  independent  stable  random  variables  with  the  same  distri¬ 
butions  as  So  dnd  S\  in  (3.11). 

When  a  >  1  then  (3.10)  is  also  true  when  p{h)  is  replaced  by  its  mean-corrected 
version,  p{h). 

We  consider  now  an  example  of  this  theorem  in  practice.  Let  Xt  be  the 
symmetric  stable  M.A{q)  process. 


Xt  ^  Zt  OiZt^i  +  •  •  •  +  6qZt...q. 


(3.14) 


Then  the  above  theorem  implies  that 
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where  the  right  hand  side  reduces  to  U/V  if  ^  —  0. 

Thus  one  can  use  the  above  results  to  plot  confidence  intervals  for  the  ACE 
function  and  identify  the  parameter  q,  once  the  distribution  of  U/V  is  known. 

A  similar  result  also  holds  for  the  distribution  of  the  PACE.  Since  p  ^  p 
and  Rp  A  Rp,  the  consistency  of  the  Yule- Walker  estimates  follows.  In  fact, 
the  mean  value  theorem  gives 


^  —  4>  =  D{p  —  p)  +  Op{p  —  p). 


where  D  is  the  p  x  p  matrix  of  partial  derivatives  of  vector  function 
i?p(z)-^z.  Here  Rp{z)  =  zq  =  I  B.nd  (p  =  ip{p).  Theorem  3.1  then 

yields  that 


(n/lnn)i/“(^  -  cf>)  =>  D{Yy, .  ..Yp)'  (3.16) 


where  the  vector  (1^,. .  -  Yp)  has  distribution  described  by  (3.11)  ■  (3.13). 

The  limiting  distribution  of  the  PACE’s  is  now  given  by  (3.16)  and  (3.11) 
-  (3.13),  and  so  is,  in  general,  rather  complicated.  However,  when  p  =  0,  the 
right  hand  side  of  (3.16)  reduces  to  U/V,  which  is  the  same  limit  as  for  the  ACE 
of  white  noise.  Since  in  the  null  hypothesis  case  this  is  what  is  required  to  test 
whidh  of  the  4>mm  are  zero,  we  are  in  the  fortunate  situation  of  being  able  to 
use  the  same  distribution  twice. 


3.2  Some  important  quantiles. 

In  practice,  the  distribution  of  U/V  cannot  be  computed  theoretically,  but  only 
via  simulation  or  numerical  integration  of  the  joint  density  of  the  vector  (U,  V) 
over  an  appropriate  region. 

Figure  4  gives  the  density  of  U /V  for  two  values  of  a.  Note  the  high  tails 
of  the  distribution,  which  are  high  even  in  relation  to  stable  distributions. 

Table  3  contains  the  97.5%  quantiles  of  U/V  (the  distribution  of  U/V  is 
symmetric)  which  for  o:  <C  2  were  found  via  simulation  of  500,000  values  of  U/V 
using  a  corrected  version  of  the  S-plus  routine  for  generating  stable  random 
variables,  (cf.  [ST]  p43.  The  S-plus  routine  does  not  quite  deliver  what  you 
might  expect  in  the  asymetric  case!)  Our  value  for  a  =  1  coincides  with  the 
value  quoted  in  [BD],  p541,  and  found  via  numerical  integration. 
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Figure  4:  Kernel  density  estimates  of  UjV  for  x  €  [’-’2,2]  and  a  =  1.5  and  1.8. 


Table  3:  97.5%  quantiles  of  UjV 

a 

97.5%  quantile 

a 

97.5%  quantile 

0.3 

9.338e+04 

1.3 

3.865e+00 

0.4 

4.625e+03 

1.4 

2.814e+00 

0.5 

7.375e+02 

1.5 

2.0596+00 

0.6 

1.986e+02 

1.6 

1.516e+00 

0.7 

7.745e4-01 

1.7 

1.096e+00 

0.8 

3.710e+01 

1.75 

9.280e-01 

0.9 

2.072e+01 

1.8 

7.637e-01 

1.0 

1.240e+01 

1.9 

4.765e-01 

1.1 

8.088e+00 

2.0 

1.960e+00 

1.2 

5.532e+00 

3.3  Estimating  the  lag  parameters  via  the  ACF  and  PACF. 

To  see  how  useful  the  above  results  are,  and  to  compare  them  to  an  attempt  to 
estimate  the  parameters  p  and  q  assuming  normality,  we  conducted  the  follow¬ 
ing,  rather  illuminating,  double  blind  study. 

We  simulated  100  time  series  of  length  n  =  1000  with  symmetric  stable  in¬ 
novations  with  a  =  1.2.  The  models  were  selected  at  random  from  the  following 
five  models: 


=  Zt-JZt-u 

Xt  - 


Zt  —  JZt-i  +  .8Zt_2, 
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Xt~.7Xt-j  -  Zt, 

X,  -  .7Xt-i -f- 

We  plotted  the  ACF  and  PACK  for  each  of  these  series,  and  then  used  these 
plots  to  try  to  identify  the  true  model  using  the  standard  time  series  technique 
of  looking  at  the  plots  and  seeing  how  they  behave  relative  to  the  95%  confidence 
intervals  under  a  white  noise  null  hypothesis.  Since  we  did  not  want  to  assume 
a  to  be  known,  we  used  confidence  intervals  corresponding  to  the  Gaussian 
{a  =  2)  and  Cauchy  (a  =  1)  cases.  (In  fact,  the  confidence  intervals  for  the 
true  a  =  1.2  and  Cauchy  cases  were  indistinguishable  to  the  human  eye.  The 
Gaussian  intervals,  however,  were  about  28%  shorter.)  The  conclusions  were 
then  compared  against  the  true  models. 

The  procedure  showed  31%  error  when  Gaussian  limits  were  used  and  17% 
error  using  Cauchy  limits.  Although  the  error  rate  clearly  depends  on  experience 
of  a  person  doing  identification,  it  is  clear  from  this  study  that  using  stable  limits 
for  heavy-tailed  data  reduces  the  error  rate  significantly. 

3.4  On  asymptotics  or  funny  thing  happenned  on  the  way  to  oo”. 

A  fact  often  touted  by  stable  time  series  theorists  as  a  compensation  for  the 
difficulties  generally  associated  with  stable  rather  than  Gaussian  analysis  is  that 
the  rate  of  convergence  of  p{h)  -  p{h)  to  zero  is  of  the  order  0([n/lnn]“^/^)  = 
for  all  P  >  which  is  considerably  faster  than  in  Gaussian  case  when 
the  rate  is  on  the  order  of 

However,  despite  this  comforting  fact,  there  are  some  other,  rather  problem¬ 
atic,  phenomena  associated  with  this  convergence,  since  the  rate  of  convergence 
of  the  distribution  of  p{h)  -  p{h)  to  the  limiting  distribution  is  actually  very 
slow. 

Before  we  mention  some  theory,  consider  Table  4,  which  indicates  how  fast 
(or  slow)  the  distribution  of  p(l)  converges  to  the  theoretical  one  for  the  white 
noise  model.  For  values  of  a  =  1.4,1,7,1.75,  and  1.8  and  sample  size  n  we 
computed  10,000  coefficients  p{l)  from  independent  white  noise  sequences  with 
corresponding  values  of  a  and  n  and  checked  the  percentage  of  times  the  co¬ 
efficient  was  not  within  the  nominal  95%  confidence  interval,  (Of  course,  this 
should  be  5%.)  For  determining  confidence  intervals  we  used  three  different 
distributions:  a  stable  distribution  with  the  correct  a  (as  described  above),  a 
Cauchy  distribution  and  Gaussian  distribution,  (i.e.  we  used  either  the  correct 
value  of  a,  or  behaved  as  if  a  =  1  or  2).  The  results  show  that  when  the  correct 
stable  distribution  is  used,  the  convergence  to  theoretical  5%  error  is  very  slow, 
and  that  for  “small”  sample  sizes  of  the  order  of  1000  the  Cauchy  based  limits 
actually  give  the  best  results. 

The  main  reason  behind  this  phenomenon  seems  to  be  the  slow  convergence 
of  the  distributions  of  stable  averages  to  their  limiting  distributions,  a  fact  that 
has  a  well  documented  history,  (cf.  [CH,  HA,  HW,  JM]  although  none  of  these 
quite  treats  our  setting.) 
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Table  4:  Percent  error  for  95%  confidence  interval  of  p(l), 

white  noise 

a 

n  =  1000 

n  =  10,000 

n  =  500,000 

n  =  1,000,000 

(i)  a-stable  limits 

1.4 

1.43 

2.28 

3.08 

3.27 

1.7 

4.37 

4.58 

4.68 

3.67 

1.75 

6.53 

5.35 

5.40 

5.60 

1.8 

10.55 

8.65 

7.44 

6.29 

(ii)  Cauchy  limits 

1.4 

1.24 

6.62 

32.27 

40.10 

1.7 

0.80 

15.87 

69.87 

73.95 

1.75 

0-70 

17.19 

73.07 

79-65 

1.8 

0.74 

19.79 

76.04 

80.69 

(iii)  Gaussian  limits 

1.4 

2.91 

2.17 

0.85 

0.89 

1.7 

3.46 

3.36 

2.14 

1.35 

1.75 

3.56 

3.23 

2.72 

2.52 

1.8 

4.22 

3.68 

2-90 

2.41 

^Prom  the  practical  point  of  view,  this  phenomenon  shows  up  in  an  inter¬ 
esting  way  in  the  numerics.  Figure  5  shows  the  values  of  the  upper  limits  of  the 
confidence  intervals  for  (p(l)  —  p(l))  for  various  a  and  n,  based  on  the  limiting 
distribution  of  Theorem  1  and  a  white  noise  time  series.  Note  how,  for  “small” 
n,  the  confidence  interval  shrinks  to  a  point  as  a  2. 

We  are  not  certain  of  the  reason  for  this.  One  possibility  is  that  for  small 
n  the  limiting  distribution  of  U/V  is  not  appropriate.  What  seems  more  rea¬ 
sonable,  however,  is  that  the  norming  constants  used  to  obtain  the  limiting 
distribution,  while  asymptotically  correct,  are  too  large  (in  an  a  dependent 
fashion)  for  small  and  intermediate  values  of  n. 

It  is  not  totally  clear  how  to  get  aroimd  this  problem  without  using,  perhaps, 
something  like  a  bootstrap.  However,  there  is  growing  evidence  that  in  the  stable 
situation  bootstrapping  is  also  problematic  (cf.  [LPPR]  in  this  volume.) 

One  possible  approach  comes  out  of  Table  5,  which  explores  the  n  =  1000 
case  for  different  a’s.  Our  model  is  again  white  noise,  and  we  record  the  per¬ 
centage  of  times  when  the  sample  coefficient  p(l)  is  outside  the  95%  confidence 
interval;  i.e.  the  percentage  of  wrong  identifications  of  the  model  when  the  iden¬ 
tification  procedure  is  done  by  computer  and  is  based  on  the  value  of  p(l)  only. 
The  confidence  levels  were  computed  based  on  the  true  a-stable  distribution,  as 
well  as  Cauchy  and  Gaussian  distributions.  The  number  of  simulations  for  each 
case  was  m  =  10, 000. 

It  seems  that  the  Gaussian  limits  perform  the  most  poorly,  at  least  for 
a  <  1.7  and  that  for  all  a  Cauchy  limits  not  only  perform  better  than  the 
others,  but  do  quite  well  on  an  absolute  scale.  (The  error  here  is  never  larger 
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Figure  5:  Confidence  intervals  for  (p(l)  -p(l))  for  1  <  a  <  2  and  various  values 
of  n.  (a)  n  =  1, 000,  (b)  n  =  5, 000,  (c)  n  =  500, 000,  (d)  n  =  1, 000, 000 

than  1.28%).  Furthermore,  for  small  a  there  is  no  significant  benefit  in  using 
the  confidence  limits  based  on  the  true  stable  distribution.  However,  for  large 
cc,  the  rate  of  the  rate  of  convergence  of  p(l)  to  its  theoretical  distribution  is 
so  slow  that  Cauchy  or  Gaussian  distribution  should  be  used.  Comparison  with 
Table  4  strengthens  this  point. 


Table  5:  Percent  error  for  95%  confidence  interval  of  p(l),  n  =  1000,  WN 

a 

true  a-stable  distribution 

Cauchy  limits 

Gaussian  limits 

1.0 

1.12 

1.12 

2.03 

1,1 

1.20 

1.28 

2.32 

1.2 

1.13 

1.22 

2.46 

1.3 

1.27 

1.23 

2.61 

1.4 

1.43 

1.24 

2.91 

1.5 

1.85 

1.18 

3.26 

1.6 

2.55 

1.19 

3.33 

1.7 

4.37 

0.80 

3.46 

1.8 

10.55 

0.74 

4.22 

1.9 

25.63 

0.42 

4.55 

2.0 

4.70 

0.77 

4.70 

In  Table  6  we  continue  the  theme  of  using  Cauchy  based  bounds,  regardless 
of  the  true  value  of  a.  While  this  clearly  gives  a  conservative  test,  it  turns 
out  that  in  practice  it  is  not  overly  so,  and  the  results  here  illustrate  how 
amazingly  well  the  Cauchy  bounds  perform  in  the  identification  of  the  MA(1) 


Analysing  stable  time  series 


15 


model  Xt  =  Zt  —  SZt-i.  Again,  each  specific  case  was  run  m  =  10, 000  times.  In 
each  run  for  sample  size  n  =  1000  and  given  a  we  computed  p(l), . . . ,  p(10)  and 
recorded  an  ^^error”  when  p(l)  was  within  the  confidence  limits  for  white  noise  or 
when  p{l)  was  outside  these  limits  but  one  of  the  coefficients  p(2), . , . ,  p(10)  was 
also  outside  the  limits,  so  identification  as  MA(1)  would  not  be  called  for.  The 
confidence  limits  were  taken  to  be  (i)  {lnn/ny/^(U/V)  with  U  and  V  coming 
from  the  true  distribution,  (ii)  as  in  (i)  but  with  U  and  V  corresponding  to 
Cauchy  distribution  and  with  a  =  1,  (iii)  1.96/ y/n,  (iv)  according  to  Bartlett’s 
formula  for  MA(1)  model  with  true  value  of  p(l).  The  whole  procedure  was 
performed  by  computer  without  human  intervention. 

it  is  clear,  although  perhaps  somewhat  surprising,  that  Cauchy  limits  work 
the  best  and  that  the  identification  procedure  works  better  for  heavy-tailed 
series  than  for  their  finite  variance  counterparts. 


Table  6:  Percent  of  wrong  identifications  of  MA(1)  model 

a 

true  a-stable 

Cauchy 

Gaussian 

Bartlett 

1.0 

10.31 

10.31 

17.36 

12.97 

1.1 

11.61 

12.13 

20.36 

14.81 

1.2 

11.48 

12.08 

22.15 

15.38 

1.3 

12.96 

12.44 

25.10 

16.64 

1.4 

14.89 

12.53 

27.75 

17.80 

1.5 

20.33 

13.80 

31.99 

19.71 

1.6 

28.19 

13.60 

35.52 

20.80 

1.7 

45.59 

13.50 

40.33 

22.32 

1.8 

68.55 

13.85 

43.49 

23.95 

1.9 

91.59 

14.86 

48.39 

26.17 

2.0 

52.59 

17.21 

52.59 

29.73 

We  close  this  subsection  with  a  some  brief  information  on  the  asymptotic 
behaviour  of  the  PACF,  which,  not  surprisingly,  is  similar  to  that  of  the  ACF. 
Table  7  is  a  PACF  version  of  Table  5,  generated  in  the  same  fashion,  however 
with  data  on  rather  than  p{l)  being  tabulated. 
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Table  7;  Percent  error  for  95%  confidence  interval  of  ^ii,n  =  1000,  WN 

a 

true  a-stable  distribution 

Cauchy  limits 

Gaussian  limits 

1,0 

1,07 

1,07 

1-93 

1.1 

1.27 

1,30 

2.39 

1.2 

1.13 

1.25 

2.44 

1.3 

1.43 

1.36 

2.62 

1.4 

1.44 

1.20 

3.11 

1.5 

1.87 

1.30 

3.23 

1.6 

2.45 

1.08 

3.33 

1.7 

4.64 

0.84 

3.86 

1.8 

10.52 

0.70 

4.35 

1.9 

25.93 

0.47 

4.13 

2.0 

4.84 

0.83 

4.84 

3.5  The  AIC  criterion. 

In  the  finite  variance  case  the  prime  criterion  for  automated  model  selection 
recommended  by  [BD]  is  the  AICC,  a  modified  version  of  Akaike’s  AIC  crite¬ 
rion.  An  investigation  of  the  AIC  criterion  in  the  infinite  variance  situation  was 
carried  out  by  [BH]  jmd  [KNl]. 

For  an  autoregressive  model  the  AIC  statistics  are  defined  by 

AIC{k)=nlncr^{k)  +  2k, 

where  n  is  the  sample  size,  and  {k)  is  the  estimate  of  the  innovation  variance 
obtained  from  Yule- Walker  estimates  for  A:-th  order  autoregressive  sequence  (cf. 
(3.6)).  Then 

p  =  argminfc<^  AJC(A:), 

where  AT  is  an  acceptable  upper  bound  for  p,  is  the  corresponding  estimate  of 
the  order  p.  [KNl]  showed  that  this  procedure  is  consistent  for  heavy  tailed 
situations. 

To  see  how  the  AIC  criterion  works  in  practice  for  stable  AR  series  we 
performed  1000  simulations  of  the  AR(1)  model 

Xt  =  0.4Xt_i  -b  Zt,  (3.17) 

for  a  =  0.5, 1.0,  1.75,  2.0.  The  sample  size  was  fixed  at  n  =  200,  which  is  rather 
small  by  stable  standards,  where  larger  sample  sizes  are  required  than  in  the 
Gaussian  case. 

In  each  run  we  assumed  that  we  were  looking  for  the  best  AR  model;  i.e. 
we  looked  for  order  p  which  minimised  the  AIC  criterion.  We  then  recorded  the 
number  of  times  that  the  correct  order  (p  =  1)  was  correctly  identified. 

To  check  how  the  AIC  criterion  works  for  a  MA  model,  we  used  the  S-plus 
arima.mle  routine  which  defines  AIC  as  (—2)  times  the  log  Gaussian  likelihood 
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plus  two  times  the  number  of  parameters  fit  (see  the  definition  in  [BD],  p.  304 
or  consult  the  S-plus  manual).  We  performed  1000  runs  for  model: 

Ai  —  Zt  A-  O.SZi^i  (3.18) 

with  a  =  0.5,  1.0,  1.75,  2.0,  n  —  200.  Again,  in  each  run  we  assumed  that 
we  were  looking  for  the  best  MA  model,  and  recorded  the  number  of  correct 
identifications-  The  results  of  simulations  for  both  models  are  given  in  Table 
8  and  the  conclusion  is  that  in  both  cases  the  AIC  criterion  works  better  the 
heavier  the  tails! 

The  MA  case  is  particularly  interesting,  since  we  have  no  theoretical  justi¬ 
fication  for  applying  the  AIC  based  on  the  Gaussian  likelihood  to  heavy-tailed 
data. 


4.  Parameter  estimation  for  ARMA  models:  The  Whittle  estimator. 


Paper  [CD]  of  this  volume  gives  an  extensive  review  of  estimation  techniques 
for  linear  processes  with  stable  innovations.  They  present  convincing  evidence 
to  the  effect  that  LAD  and  MLE  estimators  are  superior,  in  the  heavy  tailed 
setting,  to  the  estimators  traditionally  used  in  finite  variance  time  series,  such 
as  least  square  or  Whittle  (periodogram)  estimators.  However,  for  both  of  these 
cases,  the  sampling  distribution  of  the  parameter  estimates  is,  at  least  at  the 
moment,  numerically  as  well  as  theoretically  intractable. 


Table  8:  Percent  of  correct  model  identifications  made  by  AIC. 

a 

AR(1)  -  (3.17) 

MA(1)-  (3.18) 

0.5 

96.1 

92.5 

1.0 

89.6 

89.0 

1.75 

78.7 

79.7 

2.0 

73.2 

72.5 

However,  for  the  Whittle  estimator,  we  have  the  following  result  of  [MGKAj: 

Theorem  4.1  Let  {ATt}  be  a  causal^  invertible,  a-stable,  ARMA(p,q)  process, 
with  parameters  P  =  (<^i, . . .  . . .  ,0^)'.  Let  C  denote  the  space  of  per- 

missable  parameter  values:  i.e.  C  =  {)9  G  :  <j>{z)  •  6{z)  ^  0  for  \z\  <  1  and 
(^(•),  6{^)  have  no  common  zeroes}.  Define  the  polynomials 


(l){z)  =  1  -  <i>iz  -  ...  -  <f)pzP,  e(z)  =  1  4-  -h  . . .  +  Ogz"^ 
and,  for  all  — tt  <  A  <  tt,  the  “power  transfer  function” 


(4.1) 


5(A;;8)  = 


(4.2) 
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Denote  the  self-normalized  periodogram  by 


-at! 


Er=a? 


— TT  <  A  <  TT, 


(4.3) 


The  periodogram,  or  “Whittle”  estimator  (3n  of  the  true,  but  unknown,  parameter 
pQ,  is  found  by  minimizing 


(4.4) 


with  respect  to  P  eC,  where  the  sum  is  taken  over  all  frequencies  Xj  =  211  j/n  € 
(— tTjTt].  Then, 


{Bn  -  Po)  =>  47riy-i(/3o)^  Skbk, 

it=i 


(4.5) 


where  the  5o,5i,...  are  as  in  Theorem  1,  W-'^{Po)  is  the  inverse  of  the  matrix 

^-,T 

dX, 


W{po)=  [ 

J  — TT 


\dlng{X-,Po) 


dp 


din  g{X]Po) 


dp 


and 


27r 

where  denotes  the  reciprocal  of  g. 


dp 


Self-normalization  of  the  periodogram,  as  in  (4.3),  is  essential  for  the  proof 
of  the  above  Theorem.  However,  in  practice,  in  order  to  find  the  estimator 
Pn  we  minimize  expression  (4.4)  when  self-normalized  periodogram  In,x{>^)  is 
replaced  by 


In,xi^) 


t=i 


(4.6) 


One  of  the  useful  aspects  of  this  result  is  that  the  asymptotic  sample  dis¬ 
tribution  involved  here  is  closely  related  to  that  which  arises  in  the  study  of 
the  ACF  and  PACF,  so  that  simple  extensions  of  the  numerics  required  to  find 
confidence  intervals  there  also  work  here. 

To  see  how  close  the  theoretical  asymptotic  distribution  of  Whittle  estima¬ 
tor  is  to  the  true  sample  distribution  when  n  is  fixed,  we  simulated  data  using 
the  AR(1)  model  (3.17).  We  ran  three  different  sample  sizes  n  =  200,  1000 
and  10, 000  and  two  values  of  a  =  1.5,  1.75.  For  comparison  we  also  simulated 
Gaussian  data  and  used  confidence  intervals  for  the  MLE,  the  asymptotic  dis¬ 
tribution  of  which  coincides  with  the  distribution  of  Whittle  estimates  in  the 
finite  variance  case.  We  ran  m  =  10, 000  simulations  for  each  particular  case. 

The  results,  described  in  Table  9  and  on  Figure  6,  show  that  the  confidence 
intervals  based  on  the  limiting  distribution  (4.5)  work  very  well  even  for  the 
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relatively  small  sample  size  of  n  ~  200.  The  entries  in  Table  9  are  the  percent 
of  estimates  =  {$i}n  which  fall  into  the  95%  confidence  interval  centered 
around  the  true  parameter  value.  Although,  the  numbers  in  the  Gaussian  row 
are  the  closest  to  theoretical  95%,  the  results  for  a  =  1.75  are  impressive  and 
unexpected,  given  the  slow  convergence  rate  of  the  distributions  of  the  sample 
ACF  and  PACF.  Figure  6  gives  histograms  for  the  Whittle  estimate  {^i}n  for 
a  =  1.75  and  the  three  different  sample  sizes  used.  The  vertical  bars  represent 
95%  confidence  levels. 


Table  9;  Percent  of  {(f>i}n 

s  within  95%  confidence  interval;  model  (3.17) 

a 

sample  size  n=200 

sample  size  n~1000 

sample  size  n— 10000 

1.5 

99,35 

98.48 

96.41 

1.75 

92.71 

93.76 

95.08 

2 

94.77 

94.55 

94.89 

Figure  6:  Histograms  and  theoretical  95%  confidence  intervals  for  Whittle  esti¬ 
mate  {^i}n  of  <f)i  in  model  (3.17),  (a)  n  =  200,  (b)  n  =  1000,  (c)  n  =  10,000. 


5.  Diagnostic  checking 


After  identifying  and  estimating  the  parameters  of  times  series  model,  it  is 
always  nice  (although  sometimes  disconcerting)  to  see  if  the  estimated  model 
is  really  a  good  fit  to  the  data.  This  is  where  diagnostic  checking  procedures 
come  in.  This  usually  involves  identifying  the  residuals  and  seeing  how  well 
they  match  the  distribution  originally  assumed  for  the  innovations.  In  our  case, 
this  involves  checking  whether  the  residuals  are  i.i.d.  under  the  assumption  of 
an  SaS  distribution. 

For  infinite  variance  series  we  recommend  the  four  following  steps; 

(i)  Graph  the  residuals;  the  pattern  should  follow  a  white  noise  model. 

(ii)  Check  that  the  ACF  and  PACF  of  the  residuals  are  those  of  white  noise. 
The  results  and  recommendations  of  Section  3  all  apply  here. 

(iii)  The  Durbin- Watson  test  of  [PL]; 

(iv)  Various  non-parametric  tests,  see  [BD],  p.312-313. 

We  ran  a  small  simulation  to  see  how  well  the  second  of  these  techniques  ac¬ 
tually  worked.  For  models  (3.17)  and  (3.18),  stable  parameters  a  =  1.5,  1.75,  2, 


20 


Adler,  Feldman  and  Gallagher 


and  sample  sizes  ti  —  200,  1000,  10,000  we  simulated  data,  identified  a  model 
using  the  techniques  of  Section  3,  estimated  parameters  (using  the  Whittle  es¬ 
timator)  and  then  checked  whether  the  residuals  were  consistent  with  stable 
white  noise  model. 


Table  10:  Diagnostic  checking  based  on  ACF/PACF;  models  (3.17)-(3.18) 

Model 

n 

Ident. 

Pn 

Diag.  Check  (C) 

Diag.  Check(G)  | 

a  =  1.5 

(3.17) 

200 

AR(1)  (C) 

.355 

WN 

NOT  WN 

(3.18) 

200 

MA(1)  (C)^ 

.867 

WN 

NOT  WN 

(3.17) 

1000 

AR(1)(C)^ 

.388 

WN 

WN 

(3.18) 

1000 

MA(1)  (C) 

.785 

WN 

WN 

(3.17) 

10000 

AR(1)  (G) 

.402 

N/A 

WN 

(3.18) 

10000 

MA(1)  (G) 

.793 

N/A 

WN 

a  =  1.75 

(3.17) 

200 

AR(1)  (C) 

.388 

WN 

WN 

(3.18) 

200 

MA(1)  (C)2 

.835 

WN 

NOT  WN 

(3.17) 

1000 

AR(1)  (C) 

.400 

WN 

WN 

(3.18) 

1000 

MA(1)  (C)^ 

.820 

WN 

NOT  WN 

(3.17) 

10000 

AR(1)(G) 

.404 

N/A 

WN 

(3.18) 

10000 

MA(1)  (G) 

.797 

N/A 

NOT  WN 

a  =  2.00 

(3.17) 

200 

AR(1)  (G) 

.443 

WN 

WN 

(3.18) 

200 

MA(1)  (C)5 

.767 

WN 

WN 

(3.17) 

1000 

AR(1)  (G) 

.398 

WN 

WN 

(3.18) 

1000 

MA(1)  (C) 

.822 

WN 

WN 

(3.17) 

10000 

AR(1)  (G) 

.425 

N/A 

NOT  WN 

(3.18) 

10000 

MA(1)  (G) 

.801 

N/A 

WN 

1  Gaussian  and  true  1.5-stable  limits  indicate  MA(ll).  AIG  indicates  MA(1). 

2  Cauchy  limits  indicate  AR(1)  or  ARMA(1,1).  AIG  indicates  AR(1). 

*  Gaussian  and  true  l.75-stable  limits  indicate  MA(5).  AIG  indicates  MA(1). 
^  Gaussian  limits  indicate  MA(5).  True  1.75-stable  limits  indicate  possibly 
MA(8).  AIG  indicates  MA(1)  or  MA(5)(the  value  is  almost  the  same).  For 
reasons  of  parsimony  we  chose  MA(1). 

®  True  Gaussian  limits  indicate  MA(13),  while  Cauchy  limits  indicate  correct 
model  MA(1).  AIG  indicates  MA(1). 

Model  identification  was  based  on  ACF/PACF  analysis,  unless  more  than 
one  model  seemed  acceptable,  in  which  case  we  differentiated  between  the  mod¬ 
els  via  the  AIC.  For  the  “small”  sample  sizes  of  n  =  200,  1000  we  used  Cauchy 
based  confidence  limits  for  the  ACF/PACF  analysis,  while  for  n  =  10,000  we 
used  Gaussian  limits.  In  fitting  the  residuals,  with  sample  sizes  n  =  200,  1000 


Analysing  stable  time  series 


21 


we  used  both  Cauchy  and  Gaussian  limits. 

The  results  summarized  in  Table  10  show  that  diagnostic  checking  based 
on  ACF/PACF  works  as  well  in  stable  case  as  in  the  Gaussian.  The  letters 
(C)  or  (G)  in  the  table  indicate  that,  respectively,  Cauchy  or  Gaussian  based 
confidence  limits  were  used.  The  coefficient  estimate  Pn  is  in  the  AR(1) 
(3.17)  case,  and  §i  in  the  MA{1)  (3.18)  case. 

As  a  guide  to  reading  the  table,  consider  the  first  line,  which  indicates 
that  an  AR(1)  series  of  length  n  =  200  and  stable  innovations  with  a  =  1.5  was 
generated  according  to  the  model  (3.17).  It  was  correctly  identified  as  an  AR(1) 
model,  using  Cauchy  confidence  intervals  for  the  ACF  and  PACF.  The  estimate 
of  the  the  AR  coefficient  was  .355.  (The  actual  value  was  0.4;  cf.  (3.17).)  The 
residuals  were  then  investigated.  Using  Cauchy  confidence  intervals  they  were 
judged  to  be  white,  which  was  not  the  case  with  Gaussian  confidence  intervals. 

Although  we  have  not  studied  it,  we  note  that  a  stable  analogue  of  the 
Durbin- Watson  statistic  has  been  developed  in  [PL].  Since  this  statistic  essen¬ 
tially  checks  that  the  ACF  of  the  residual  sequence  at  lag  1  is  that  of  white 
noise,  its  asymptotic  distribution  is  given  by  Theorem  3,1. 

Unfortunately  no  other,  stronger,  tools  (such  as  the  f^^st  in  the  finite 
variance  case)  are  currently  available  for  diagnostic  checking  in  the  stable  situ¬ 
ation.  This  would  seem  to  be  a  promising  and  important  direction  for  further 
research. 
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Appendix:  McCulloch’s  quantile  estimator  of  stable  parameters 


In  this  section  we  describe  the  estimator  developed  by  McCulloch  ([MC]) 
for  the  indices  a  (of  stability)  and  /?  (of  skewness)  of  a  stable  distribution,  which 
have  been  referred  to  in  the  body  of  the  paper. 

Let  X  and  denote  the  p-th  quantile  of  this  distribution  by 

Xp.  McCulloch’s  estimator  uses  five  quantiles  to  estimate  a  €  [0.6, 2.0]  and 
/?  G  [—1, 1],  and  is  structured  as  follows: 

Set 


^i(o:,  P) 


X  95  —  X  05 


^2{(y,P) 


X 95  4-  Xqs  —  2X.5Q 
X  95  —  X  05 


Since  is  monotonic  in  a  and  <^2  is  monotonic  in  P  (for  fixed  a)  we  can  we 
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can  invert  these  functions  to  obtain 


a  = 


0  = 

McCulloch  tabulated  $1  and  '^2  for  various  values  of  $1  and  $2- 

To  form  an  estimator,  take  a  random  sample  from  a  stable  distribution 
and  define  #1  and  $2  by  replacing  the  quantiles  Xk  by  the  corresponding  sam¬ 
ple  quantiles  Xk-  Since  the  sample  quantiles  are  consistent  for  the  population 
quantiles,  #1  and  $2  are  consistent  estimators  of  $1  and  #2-  Define 

a  =  «'i(I>i,42), 


,0  =  ^'2(^1,^2)- 

Given  a  random  sample  from  a  stable  distribution,  we  can  now  use  the  tables 
to  find  d  and  p. 

To  obtain  estimates  of  the  scale  and  location  parameters,  McCulloch  defined 
similar  functions  using  these  same  5  quantiles,  which  were  also  tabulated  for 
various  scale  and  location  values.  These  tables  can  be  used  in  a  similar  fashion 
so  as  to  obtain  estimates  for  fx  and  for  cr. 
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